Abstract

Many real-world problems modeled by stochastic games have huge
state and/or action spaces, leading to the well-known curse of dimen- 
sionality. The complexity of the analysis of large-scale systems is dra- 
matically reduced by exploiting mean field limit and dynamical system 
viewpoints. Under regularity assumptions and specific time-scaling 
techniques, the evolution of the mean field limit can be expressed in
terms of a deterministic or stochastic equation or inclusion (difference or
differential). In this paper, we overview recent advances in large-scale
games in large-scale systems. We focus in particular on population
games, stochastic population games and mean field stochastic games. 
Considering long-term payoffs, we characterize the mean field systems 
using Bellman and Kolmogorov forward equations. 
1 Introduction

Dynamic Game Theory deals with sequential situations of several decision 
makers (often called players) where the objective for each one of the players 
may be a function of not only its own preference and decision but also of 
decisions of other players. 

Dynamic games make it possible to model sequential decision making,
time-varying interaction, uncertainty and randomness of interaction among
the players. They can model situations in which the parameters defining the
games vary in time and the players adapt their strategies (or policies)
according to the evolution of the environment. At any given time, each
player takes a decision (also called an action) according to a strategy. A
(behavioral) strategy of a player is a collection of history-dependent maps
that tell at each time the choice (which can be probabilistic) of that
player. The vector of actions chosen by the players at a given time may
determine not only the payoff for each player at that time; it can also
determine the state evolution. A particular class of dynamic games widely
studied in the literature is the class of stochastic games. Those are
dynamic games with probabilistic state transitions (stochastic state
evolution) controlled by one or more players. The discrete time state
evolution is often modeled as interactive Markov decision processes, while
the continuous time state evolution is referred to as stochastic
differential games. Discounted stochastic games were introduced in [29].
Stochastic games and interactive Markov decision processes are widely used
for modeling sequential decision-making problems that arise in engineering,
computer science, operations research, social sciences, etc. However, many
real-world problems modeled by stochastic games have huge state and/or
action spaces, leading to the well-known curse of dimensionality, which
makes the resulting models intractable. In addition, if the size of the
system grows without bound, the number of parameters (states, actions,
transitions) explodes exponentially.

In this paper we present recent advances in large-scale games in large- 
scale systems. Different models (discrete time, continuous, hybrid etc) and 
different coupling structures (weakly or strongly) are presented. Mean field 
solutions are obtained by identifying a consistency relationship between the 
individual-state-mass interaction such that in the population limit each indi- 
vidual optimally responds to the mass effect and these individual strategies 
also collectively produce the same mass effect presumed initially. In the
finite horizon case, this leads to a coupled system of forward/backward
optimality equations (partial differential equations or difference
equations).

1.1 Structure 

The remainder of the paper is structured as follows. In the next section we 
overview the mean field model description and its wide range of applications 
in large-scale wireless networks. We then focus on different mean field
coupling formulations. After that we discuss the novelties of the mean
field systems and present mean field related approaches.
1.2 Notations

We summarize some of the notation used in the paper in Table 1.

Table 1: Summary of Notations

Symbol               Meaning
-------------------  ---------------------------------------------
$f_t$                drift function (finite dimensional)
$\sigma_t$           diffusion function at time t
$x_{j,t}^n$          state of player j in a population of size n
$q_{xux',t}$         transition probability at time t
$M_t^n$              mean field process
$C_{xx'}(u, m)$      transition kernel of the population profile
$x_{j,t}$            limit of state process $x_{j,t}^n$
$r_t$                instantaneous payoff function
$g_T$                terminal payoff function

2 Overview of large-scale games 
Population games 

Interactions with a large number of players of different types can be
described as a sequence of dynamic games. Since the population profile
involves many players for each type or class and location, a common
approach is to replace individual players by continuous variables that
represent the aggregate average of type-location-actions. The validity of
this method has been proven only under specific time-scaling techniques and
regularity assumptions. The mean field limit is then modeled by a state- and
location-dependent time process. These aggregate models are also known as
non-atomic or population games. They are closely related to von Neumann
(1944) and to the mass-action interpretation in Nash (1951). In the context
of transportation networks, interactions between a continuum of players
have been studied by Wardrop (1952) in a deterministic and stationary
setting of identical players.

In a finite game, a (Nash) equilibrium is characterized by: for all j,

\[
\{x_j \in X_j : m_j(x_j) > 0\} = \mathrm{support}(m_j)
\subseteq \arg\max_{x'_j \in X_j} r_j(e_{x'_j}, m_{-j}),
\]

where $r_j(\cdot)$ denotes the payoff of player j, $X_j$ its action space,
$m_j$ its randomized action, and $m_{-j} = (m_{j'})_{j' \neq j}$.

In the infinite population game, a (Nash) equilibrium is characterized by
a fixed inclusion: the support of the population profile is included in the
argmax of the payoff function,

\[
\{x \in X : m_x > 0\} = \mathrm{support}(m)
\subseteq \arg\max_{x' \in X} r_{x'}(m).
\]

In other words, if the fraction of players using a specific action is
non-zero, then the payoff of that action is maximal. This large-scale
methodology has inherent connections with evolutionary game theory when
one studies a large number of interacting players in different
subpopulations. Different solution concepts such as evolutionarily stable
states or strategies, neutrally stable strategies and invadable states have
been proposed, and several applications can be found in evolutionary
biology, ecology, control design, networking and economics.
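
The support condition above can be checked numerically for a finite action
set. The following is a minimal sketch, not taken from the cited works: the
function name `is_population_equilibrium`, the tolerance, and the two-route
congestion payoff are all illustrative assumptions.

```python
import numpy as np

def is_population_equilibrium(m, payoff, tol=1e-9):
    """Return True if support(m) is included in argmax_x r_x(m).

    m      : population profile (fractions of players per action)
    payoff : hypothetical callable mapping m to the payoff vector (r_x(m))_x
    """
    m = np.asarray(m, dtype=float)
    r = np.asarray(payoff(m), dtype=float)
    support = m > tol
    # Every action used by a non-zero fraction must achieve the best payoff.
    return bool(np.all(r[support] >= r.max() - tol))

# Assumed example: two routes with costs m_1 and 2 * m_2 (payoff = -cost).
def congestion_payoff(m):
    return -np.array([m[0], 2.0 * m[1]])

balanced = is_population_equilibrium([2 / 3, 1 / 3], congestion_payoff)
all_on_route_one = is_population_equilibrium([1.0, 0.0], congestion_payoff)
```

In this assumed example the profile (2/3, 1/3) equalizes the two costs and
passes the check, whereas putting the whole population on the first route
does not, since the unused route would then be strictly better.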

Overview of mean field stochastic games 

We briefly present related works on mean field stochastic games. 

• Discrete time mean field stochastic games with continuum of players 
have been studied by [19] under the name anonymous sequential games. The 
authors considered the evolution of the mean field limit in the Bellman dy- 
namic programming equation. The work in [19] shows, under suitable condi- 
tions, the existence of such mean field equilibria in the case where the mean 
field limit of players' characteristics evolves nonstochastically. The
authors in [5] showed how a stochastic mean field limit can be introduced
into the model (so that the mean field limit evolves stochastically).

• Decentralized stochastic mean field control and Nash Certainty
Equivalence have been studied in [T71 HU HU [151 US] for large population
stochastic dynamic systems. Inspired by mean field approximations in
statistical mechanics and linear quadratic Gaussian (LQG) differential
games, the authors analyzed a common situation where the dynamics and
payoffs (costs, rewards, utilities) of any given agent are influenced by a
certain aggregate of the mass multi-agent behaviors, and established the
existence of an optimal response to the mean field under boundedness and
regularity assumptions. In the infinite population limit, the players
become statistically independent under some technical assumptions on the
control laws and the structure of the state dynamics, a phenomenon related
to the propagation of chaos in mathematical physics. In [16], the authors
extended the LQG mean field model to non-linear state dynamics and the
non-quadratic case for localized and multi-class players. LQG hybrid mean
field games have been considered in [43].

• In [27], [25], [26] a mathematical modeling approach for high-dimensional
systems of evolution equations corresponding to a large number of players
(particles or agents) has been developed. The authors extended the field
of such mean field approaches to problems in economics, finance and game
theory. They studied n-player stochastic differential games and the
related problem of the existence of equilibrium points, and by letting n
tend to infinity they derived the mean field limit equations, such as the
Fokker-Planck-Kolmogorov (FPK) equation coupled with the mean field version
of the Hamilton-Jacobi-Bellman-Fleming (HJBF) equation. Applications to
finance can be found in [12]. The authors in [6], [7], [42] extended the
framework to mean field stochastic differential games under a general
structure of the drift and noise functions, and also with major and minor
players. The authors in [10, 24] applied mean field games to crowd and
pedestrian dynamics. Numerical methods for
solving backward-forward partial differential equations can be found in pQ. 

• Discrete time models with many-firm dynamics have been studied by [45],
[44] using decentralized strategies. They proposed the notion of oblivious
equilibria via a mean field approximation. An extension to unbounded cost
functions can be found in [2]. In [3], a mean field equilibrium analysis of
dynamic games with complementarity structures has been conducted. In
[541 I4U], models of interacting players in discrete time with a finite
number of states have been considered. The players share local resources.
The players are observable only through their own state, which changes
according to a Markov decision process. In the limit, when the number of
players goes to infinity, the asymptotic system is given by a non-linear
dynamical system (mean field limit). The mean field limit can be in
discrete or in continuous time, depending on how the model scales with the
number of players. If the expected number of transitions per player per
time slot vanishes when the size of the system grows, then the limit is in
continuous time; otherwise the limit is in discrete time. Markov mean field
teams have been studied in [34]; Markov mean field optimization, controls
and Markov decision processes have been studied in [36]. Connections of the
resulting limiting mean field games to anonymous games or stochastic
population games have been established. A stochastic population game is
given by a population profile which evolves in time, internal states for
each player, and a set of actions in each state and population profile. The
expected payoff of a player is completely determined by the population
profile and the player's current internal state. At the continuum limit of
the population, one can have (i) a discrete time mean field game, which
covers the so-called anonymous sequential games, or (ii) a continuous time
mean field game, leading to the so-called differential population games.
The corresponding limiting games reduce to

(i) Differential population games, in which the optimality criterion leads
to an extended HJBF equation coupled with FPK equations, or

(ii) Anonymous sequential games, in which the leading dynamics are mean
field versions of Bellman-Shapley equations combined with discrete time
mean field Kolmogorov forward equations, similar to the prescribed dynamics
developed by [19].

Networking applications 

Below we present the relevance of large-scale games in large-scale
networks. Due to the limitations of classical perfect simulation
approaches in the presence of a large number of entities, a mean field
approach can be more appropriate in some scenarios:

MFSG and continuum modeling 

The simulation of multiple networks and their statistical modelling can be
very expensive, whereas solving a continuum equation such as a partial
differential equation can be less expensive in comparison. Examples of such
large-scale systems include:

• the Internet of Things with 2 billion nodes,

• a network of sensors deployed along a volcano, collecting large
quantities of data to monitor seismic activity, where transmissions go
from relay node to relay node until finally delivered to a base station,

• disruption-tolerant networks with opportunistic meetings in a large
population of 20,000,000 nodes.

Opportunistic interaction under random mobility: 

The work in [TUl 121] has modelled crowd behavior and pedestrian dynamics
using a mean field approach. Inspired by [10], one can get a random
mobility model for the users. In [SB] an application to malware propagation
in opportunistic networking has been studied. This example illustrates how
mean field game dynamics can be useful in describing the network dynamics
in the absence of infrastructure, under low connectivity, and in the
absence of fixed routes to disseminate information. The model has been
extended to Brownian mobility of players with a communication distance
parameter and energy saving in wireless ad hoc networks. A challenging
problem of interest in such a configuration is routing packets over the
wireless network from sources to destinations (whose locations are unknown
and which can move randomly). The wireless random path maximizing the
quality of service with minimal end-to-end delay from a source to a
destination changes continuously as the network traffic and the topology
change. An expected element characterizing the network state (mean field)
and a mean field learning-based routing protocol are therefore needed to
estimate the network traffic and to predict the best network behavior.



MFSG for carrier sense multiple access protocols:

The mean field stochastic game approach has potential applications in
wireless networks (see [2] and the references therein). Mean field Markov
models have been studied in detail in [9], [8] for Carrier Sense Multiple
Access (CSMA)-based IEEE 802.11 Medium Access Control (MAC) protocols and
gossip protocols. When the strategies of the users are taken into
consideration, one gets interdependent decision processes for the backoff
stage. The backoff process in IEEE 802.11 is governed by a Markov decision
process if the duration of per-stage backoff is taken into account:

• every node in backoff state $x_\theta$ attempts transmission with a given
probability;

• if it succeeds, the backoff state changes to 0;

• otherwise, the backoff state changes to $(x_\theta + 1) \bmod (K_\theta + 1)$,
where $K_\theta$ is the index of the maximum backoff state in class $\theta$.
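
The backoff rule listed above can be sketched as a one-slot transition
function. This is only an illustration under stated assumptions: the
parameters `attempt_prob` and `success_prob` are hypothetical placeholders
(in the game setting they would depend on the player's strategy and on the
mean field), and the choice that a node which does not attempt keeps its
backoff state is an assumption of this sketch.

```python
import random

def backoff_step(state, max_state, attempt_prob, success_prob, rng=random):
    """Return the next backoff state of one node after one slot.

    state        : current backoff state, in {0, ..., max_state}
    max_state    : index of the maximum backoff state for this class
    attempt_prob : hypothetical probability of attempting a transmission
    success_prob : hypothetical probability that an attempt succeeds
    """
    if rng.random() >= attempt_prob:
        return state                       # assumption: no attempt, no change
    if rng.random() < success_prob:
        return 0                           # success resets the backoff state
    return (state + 1) % (max_state + 1)   # failure moves to the next stage

# Deterministic corner cases, useful as sanity checks.
after_failure = backoff_step(1, 3, attempt_prob=1.0, success_prob=0.0)
wrap_around = backoff_step(3, 3, attempt_prob=1.0, success_prob=0.0)
after_success = backoff_step(2, 3, attempt_prob=1.0, success_prob=1.0)
```

With these corner-case parameters a failed attempt moves state 1 to state 2,
a failure in the maximum state 3 wraps around to 0, and a success always
returns to 0.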

An extension to SINR-based admission control and quality of service (QoS)
management with backoff state can be found in [36].
Mean field power allocation

In [36] the authors study a power management problem using a mean field
stochastic game. The mean field approach has been applied to dynamic
power allocation (vector) in green cognitive radio networks. The authors
showed that if the players react to the mean field and the size of the
system is sufficiently large, then decentralized mean field power
allocations are approximate equilibria.

Applications of MFSG to energy markets in smart grids, chemical reactions,
water composition and molecular mobility can be found in [36].

3 Basics of MFSG models 

In this section we overview basics of mean field stochastic game (MFSG) 
models. 

3.1 Weakly coupling 

Weakly coupling via the payoff functions. The players are weakly coupled
only via the payoff functions if the individual state dynamics are not
directly influenced by the other players' states and strategies, i.e.,

\[
x_{j,t+1}^n = f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, w_{j,t}^n), \tag{1}
\]

where $x_{j,t}^n$ is the state of player j, $f_{j,t}^n$ is a deterministic
function, $u_{j,t}^n$ is the action/control of player j and $w_{j,t}^n$ is a
random variable (independent of the state and the action processes of the
others) with transition probabilities given by

\[
P(x_{t+1} \in \bar{X} \mid x_t^n, u_t^n, \ldots, u_0^n, x_0^n),
\]

where $\bar{X}$ is a subset of the state space $\mathcal{X}$. The
instantaneous payoff function of player j may depend on the states and/or
actions of the others, or on the state mean field
$\frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{x_{j,t}^n = x\}}$, or on the
state-action mean field

\[
\frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{(x_{j,t}^n, u_{j,t}^n) = (x, u)\}},
\]

or on the population profile process
$\frac{1}{n}\sum_{j=1}^{n} \delta_{x_{j,t}^n}$, etc.
Note that in a dynamic environment, the players may not interact all the
time with the same set of neighbors. Some players may be active or
inactive, some new players may join or leave the game temporarily, etc.
Then the payoff function of a player depends on the state and also on the
actions of all the players that she/he meets during the long-run
interaction.

A simple continuous time version of the above state dynamics is the
following Itô stochastic differential equation (SDE):

\[
dx_{j,t}^n = f_{j,t}^n(x_{j,t}^n, u_{j,t}^n)\,dt
+ \sigma_{j,t}^n(x_{j,t}^n, u_{j,t}^n)\,dB_{j,t}, \tag{2}
\]

where $\sigma_{j,t}^n$ is the diffusion (variance) function and $f_{j,t}^n$
is the drift function for player j at time t, and $B_j$ is a standard
Brownian motion (Wiener process). An example of such dynamics is
$dx_{j,t}^n = u_{j,t}^n\,dt + \sigma_{j,t}^n\,dB_{j,t}$.
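
To make the example dynamics concrete, here is a minimal Euler-Maruyama
sketch for n independent players following dx = u dt + sigma dB. The
function name `simulate_paths`, the feedback form `control(t, x)` and the
constant `sigma` are assumptions made for illustration; the time
discretization is a standard numerical scheme, not a construction from the
paper.

```python
import numpy as np

def simulate_paths(n_players, n_steps, dt, control, sigma, x0=0.0, seed=0):
    """Euler-Maruyama simulation of dx = u dt + sigma dB for each player.

    control : hypothetical callable (t, x) -> u, applied to all players
    sigma   : constant diffusion coefficient (an assumption of this sketch)
    Returns an array of shape (n_steps + 1, n_players).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_players, x0, dtype=float)
    path = np.empty((n_steps + 1, n_players))
    path[0] = x
    for k in range(n_steps):
        t = k * dt
        # Independent Brownian increments, one per player.
        dB = rng.normal(0.0, np.sqrt(dt), size=n_players)
        x = x + control(t, x) * dt + sigma * dB
        path[k + 1] = x
    return path

# Noise-free sanity check: with u = 1 and sigma = 0 the state grows like t.
drift_only = simulate_paths(3, 100, 0.01, lambda t, x: 1.0, sigma=0.0)
# Noisy population with a mean-reverting feedback control (illustrative).
noisy = simulate_paths(1000, 200, 0.01, lambda t, x: -x, sigma=0.5, x0=1.0)
```

Averaging any function of `noisy[k]` over the players gives the empirical
(mean field) quantity at step k.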

How does the payoff depend on the mean field? When the number of players is
very large, the payoff function can be expressed as a function of the mean
field under technical conditions. Here is a simple example. Let the
instantaneous payoff functions be of the following form:

\[
r_{j,t}^n = \frac{1}{n}\sum_{i=1}^{n}
r_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n).
\]

Recall that for any measurable bounded function $\phi$ defined over the
state space, one has

\[
\int \phi(w) \Big[\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i,t}^n}\Big](dw)
= \int \phi(w)\, M_t^n(dw)
= \frac{1}{n}\sum_{i=1}^{n}\phi(x_{i,t}^n).
\]

Thus, the instantaneous payoff function is

\[
r_{j,t}^n(x_{j,t}^n, u_{j,t}^n, M_t^n)
= \int r_{j,t}^n(x_{j,t}^n, u_{j,t}^n, w)\, M_t^n(dw). \tag{3}
\]

The long-term payoff can be defined over a finite horizon or over an
infinite horizon, with or without a discount factor.
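
The identity between the integral against the empirical measure and the
average over players is easy to verify numerically. The sketch below is
illustrative only: the helper names and the quadratic pairwise payoff are
assumptions, not taken from the paper.

```python
import numpy as np

def empirical_integral(phi, states):
    """Integral of phi against the empirical measure of `states`.

    For M = (1/n) * sum_i delta_{x_i}, this equals the average of phi(x_i).
    """
    return float(np.mean([phi(w) for w in states]))

def mean_field_payoff(pair_payoff, x, u, states):
    """Payoff of a player in state x using action u against the population.

    pair_payoff : hypothetical pairwise payoff (x, u, w) -> float
    """
    return empirical_integral(lambda w: pair_payoff(x, u, w), states)

population = [0.0, 1.0, 2.0, 3.0]
second_moment = empirical_integral(lambda w: w ** 2, population)

# Assumed pairwise payoff: distance penalty plus a control cost.
def quadratic_pair_payoff(x, u, w):
    return -(x - w) ** 2 - u ** 2

payoff_at_one = mean_field_payoff(quadratic_pair_payoff, 1.0, 0.0, population)
```

The player only needs the population through averages of this kind, which is
why the payoff can be written as a function of the mean field.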

Weakly coupling via the individual states. Here we focus on the case
where the players are only weakly coupled via the individual states. In
this case, the payoff function of each player depends only on its own state
and own strategy, but its state is influenced by the other players' states
and actions.

An example of such discrete time dynamics is

\[
x_{j,t+1}^n = f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{-j,t}^n, u_{-j,t}^n,
w_{j,t}^n), \tag{4}
\]

where the transition kernel of $w_{j,t}^n$ depends on the states and the
actions of the others, $P(\cdot \mid x_t^n, u_t^n)$, with
$x_{-j,t} = (x_{j',t})_{j' \neq j}$, $x_t = (x_{j,t})_j$ and
$u_t = (u_{j,t})_j$.

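An example of a continuous time version is

\[
dx_{j,t}^n = f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{-j,t}^n, u_{-j,t}^n)\,dt
+ \sigma_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{-j,t}^n, u_{-j,t}^n)\,dB_{j,t},
\tag{5}
\]

which covers the following dynamics: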



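\[
dx_{j,t}^n = \frac{1}{n}\sum_{i=1}^{n}
f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n, u_{i,t}^n)\,dt
+ \frac{1}{n}\sum_{i=1}^{n}
\sigma_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n, u_{i,t}^n)\,dB_{j,t}.
\tag{6}
\]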



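The case where $f_{j,t}^n$ and $\sigma_{j,t}^n$ depend on the other players
only through their states is well studied. Then, the averaging structure
becomes

\[
dx_{j,t}^n = \frac{1}{n}\sum_{i=1}^{n}
f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n)\,dt
+ \frac{1}{n}\sum_{i=1}^{n}
\sigma_{j,t}^n(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n)\,dB_{j,t}. \tag{7}
\]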



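The last equation can be written as a function of the mean field
$M_t^n = \frac{1}{n}\sum_{j'=1}^{n}\delta_{x_{j',t}^n}$:

\[
dx_{j,t}^n = \Big[\int f_{j,t}^n(x_{j,t}^n, u_{j,t}^n, w)\, M_t^n(dw)\Big] dt
+ \Big[\int \sigma_{j,t}^n(x_{j,t}^n, u_{j,t}^n, w)\, M_t^n(dw)\Big] dB_{j,t}.
\tag{8}
\]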



For discrete time models, the same methodology can be applied to the
transition probabilities. Another way is to consider directly the model in
which the probabilities depend on the fraction of players in a specific
state, that is, on
$\frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{x_{j,t}^n = x\}}$. If the
transitions depend only on a local mean field, then they can be written as
a function of the mean field seen from that player.

Weakly coupling via neighborhoods. Consider the individual dynamics in
the form:

\[
\begin{aligned}
dx_{j,t}^n = {} & \sum_{i \in N_j} \omega_{ij}^n(t)\,
f_{\theta_j,t}(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n, u_{i,t}^n)\,dt \\
& + \sum_{i \in N_j} \omega_{ij}^n(t)\,
\sigma_{\theta_j,t}(x_{j,t}^n, u_{j,t}^n, x_{i,t}^n, u_{i,t}^n)\,dB_{j,t}, \\
& x_{j,t}^n \in \mathcal{X} \subseteq \mathbb{R}^k,\ k \geq 1,\quad
j \in \{1, 2, \ldots, n\},\ \theta_j \in \Theta,
\end{aligned}
\]

where the coefficient $\omega_{ij}^n(t) \geq 0$ represents the influence of
player i on player j at time t. Then, player j has its own local mean field
$M_{j,t}^n := \sum_{i \in N_j} \omega_{ij}^n(t)\,
\delta_{(x_{i,t}^n, u_{i,t}^n)}$, where n is the number of players,
$x_{j,t}^n$ is the state of player j, $u_{j,t}^n$ is the control of player
j, and $B_j$ is a standard Brownian motion (Wiener process). The
coefficients are normalized such that
$\sum_{i \in N_j} \omega_{ij}^n(t) = 1$.

Then $\omega_{ij}^n(t) = 0$ can be interpreted as the case where player i
does not affect the state dynamics of player j. The term $\theta_j$ is the
type of player j, and $\Theta$ is the set of types.

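Then, under suitable conditions, the asymptotics of a subsequence of the
individual state dynamics lead to a macroscopic McKean-Vlasov equation with
a local mean field limit of the form:

\[
\begin{aligned}
dx_{j,t} = {} & \int_{w'} f_{\theta_j,t}(x_{j,t}, u_{j,t}, w')\,
m_{j,t}(dw')\,dt \\
& + \int_{w'} \sigma_{\theta_j,t}(x_{j,t}, u_{j,t}, w')\,
m_{j,t}(dw')\,dB_{j,t}, \\
& x_{j,t} \in \mathcal{X} \subseteq \mathbb{R}^k,\ k \geq 1,\quad
u_{j,t} \in U_{\theta_j}.
\end{aligned}
\]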

Note that the processes $m_{j,t}$ are interdependent and their laws can be
obtained as a solution of coupled systems of Fokker-Planck-Kolmogorov
equations. Moreover, the convergence rate is of order
$O(\frac{1}{\sqrt{n}} + \epsilon_n)$, where $\epsilon_n$ captures the
initial estimates and the gap between the initial distributions. We refer
the reader to [36] for more recent discussions on the convergence issue.

3.2 Strongly coupling 

Here the state evolutions and the payoff functions depend on the states
and/or the strategies of some of the other players. Typically, most games
with a variable number of interacting players over time fall into the class
of strongly coupled mean field interactions. For example, the instantaneous
payoff

and the state dynamics

$x_{j,t}^n \in \mathcal{X} \subseteq \mathbb{R}^k$, $k \geq 1$,
$j \in \{1, 2, \ldots, n\}$, $\theta_j \in \Theta$,

lead to a strongly coupled mean field interaction.



4 What is new? 



The novelties of the MFSG approach are in the characterization of the mean
field optimality¹. These optimality equations differ from those of
classical dynamic games and dynamic programming principles.

4.1 Discrete time

In the mean field stochastic Markov game modeling in discrete time, there
must be an equation to express the dynamic optimization problem of each
player. Usually this involves one equation for each player. If players are
classified together by similar player types, there is one equation per
type. This equation is generally a Bellman-Shapley equation, since a large
proportion of dynamic optimization problems with perfect state observation
fall within the framework of dynamic programming. Hence, the
Bellman-Shapley equations will be used to compute optimal behavioral
strategies. An equation is also needed to express the subpopulations'
behavior, that is, the mean field behavior of each type. The dynamics of
the distribution are governed by a Kolmogorov forward equation. In the
Kolmogorov forward equation, the optimal behaviors of the players occur as
data, since it is the infinite collection of individual behaviors that is
aggregated and constitutes the collective behavior by consistency. Thus,
the modeling of the behavior of a group of players naturally leads to a
BS-K (Bellman-Shapley and Kolmogorov) system of equations. The discrete
BS-K system was studied by Jovanovic & Rosenthal in the 1980s. The novelty
is that, in the mean field games formalism, the density of players on the
state space can enter the Bellman-Shapley equation. Thus, the mean field
equilibrium is defined by a BS-K system in which the equations are doubly
coupled: individual behaviors are given as data for the Kolmogorov forward
equation and, at the same time, the distribution of players in the state
space enters the Bellman equation. This means that players can incorporate
into their preferences the density of states/actions of other players at
the anticipated equilibrium. Therefore each player can construct his
strategy by taking account of the anticipated distribution of strategies
and of the actions of other players. Under suitable conditions, this fixed
point of behaviors, the mean field equilibrium, can be defined by passing
to the limit on the number of players in the class of Markov games in
discrete time (or difference games) that are asymptotically invariant by
permutation within the same type of players, a property called Asymptotic
Indistinguishability Per Class².

¹ Note that "mean field optimality" refers to response to a consistent mean
field. It is not necessarily optimal in the finite regime.
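
The double coupling described in this subsection can be illustrated by a
small fixed-point iteration on a finite state space: a backward Bellman pass
computes a best response to an anticipated mean field flow, and a forward
Kolmogorov pass recomputes the flow generated by that response. This is a
sketch under strong assumptions, not the method of the cited works: the
transition matrices do not depend on the mean field, the damping scheme is a
heuristic, and convergence is not guaranteed in general. All names
(`solve_bsk`, `P`, `r`, `g`) are hypothetical.

```python
import numpy as np

def solve_bsk(P, r, g, m0, T, n_iter=200, damping=0.5, tol=1e-10):
    """Damped fixed-point iteration for a finite-horizon BS-K system.

    P  : list of (S, S) transition matrices, one per action (assumed
         independent of the mean field in this sketch)
    r  : callable (t, m) -> (S, A) array of instantaneous payoffs
    g  : callable m -> (S,) array of terminal payoffs
    m0 : initial distribution over the S states
    """
    m0 = np.asarray(m0, dtype=float)
    n_states, n_actions = len(m0), len(P)
    m = np.tile(m0, (T + 1, 1))            # initial guess for the flow
    policy = np.zeros((T, n_states), dtype=int)
    v = g(m[T])
    for _ in range(n_iter):
        # Backward pass: Bellman recursion given the anticipated flow m.
        v = g(m[T])
        for t in range(T - 1, -1, -1):
            cont = np.stack([P[a] @ v for a in range(n_actions)], axis=1)
            q = r(t, m[t]) + cont
            policy[t] = q.argmax(axis=1)
            v = q.max(axis=1)
        # Forward pass: Kolmogorov equation under the computed policy.
        new_m = np.empty_like(m)
        new_m[0] = m0
        for t in range(T):
            kernel = np.stack([P[policy[t, s]][s] for s in range(n_states)])
            new_m[t + 1] = new_m[t] @ kernel
        gap = np.abs(new_m - m).max()
        m = damping * new_m + (1.0 - damping) * m
        if gap < tol:
            break
    return m, policy, v

# Assumed toy example: two locations, actions "stay" and "switch",
# congestion payoff -m[s] and a small switching cost.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
r = lambda t, m: -m[:, None] - np.array([0.0, 0.1])[None, :]
g = lambda m: -m
flow, policy, value = solve_bsk(P, r, g, [0.9, 0.1], T=5)
```

Each row of `flow` stays a probability distribution because the damped
update is a convex combination of distributions.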

4.2 Continuous time 

In the continuous time model, the Hamilton-Jacobi-Bellman-Fleming (HJBF)
equation replaces the Bellman equation and the Kolmogorov forward equation
becomes a Fokker-Planck-Kolmogorov (FPK) equation. We then get a coupled
system of partial differential equations (PDEs). In addition, in the
presence of a specific player such as a major player, its individual state
dynamics in the limit regime should be added to the system. Then, questions
of existence, uniqueness, regularity and performance bounds arise for the
system of PDEs. See the mean field games (MFG) lectures by Lions at the
Collège de France.

4.3 Connection between the mean field models 

The reader may ask what is the connection between all the above mean field 
models. 

Is there a connection between the discrete time Markov model and the 
mean field differential game model? 

The authors in [36] give a partial answer to this question. Under a
particular structure of the payoff functions and transition probabilities
of the mean field stochastic population game model, one can get a mean
field differential game in the limit of vanishing intensity of
interactions. This establishes a first connection from the discrete time to
the continuous time mean field model. Next, one needs to show the
convergence of subsequences of optimal strategies and optimal payoffs under
the Bellman-Shapley equation to those of the Hamilton-Jacobi-Bellman
equation under mean field dynamics. The authors provided sufficient
conditions for mean field stochastic games with a random number of
interacting players for mean field convergence to stochastic differential
equations. Their techniques for the mean field optimality criterion combine
the Itô-Dynkin formula with the stochastic maximum principle.

² These assumptions follow the line of the works by de Finetti (1931),
Hewitt & Savage (1955), Aldous (1983), Sznitman (1991), Graham (2000),
Tanabe (2006), McDonald (2007), etc.

A second connection can be obtained by considering mean field stochastic
difference games. Under specific time-scales, one can show that the
discrete time mean field stochastic game converges to a mean field
stochastic differential game characterized by non-linear macroscopic
McKean-Vlasov, Fokker-Planck-Kolmogorov and HJBF equations.

Following the same setting, one can design a numerical scheme for the Itô
stochastic differential equation to move from the differential mean field
model to a difference mean field model. One still needs to show that the
strategies, the values and the $\epsilon$-Nash properties hold under these
scaling schemes, because these properties depend mainly on the proposed
scheme for the time derivative and the integration of the partial
differential equations (PDEs).

5 Mean field related approaches 

In this section we present mean field related approaches. 

5.1 Connection to mathematical physics 

There are connections between exact microscopic models that govern the 
evolution of large particle systems and a certain type of approximate mod- 
els known in Statistical Mechanics as mean field limit. This notion of mean 
field limit is best understood by getting acquainted with the most famous ex- 
amples of such equations inspired from physics. The particle system model 
describes the evolution of a generic player (particle) subject to the collec- 
tive interaction created by a large number n of other players (particles). 
The state of the generic player is then given by its phase space density; 
the force field exerted by the n other players on this generic player is ap- 
proximated by the average with respect to the phase space density of the 
force field exerted on that particle from each point in the phase space. A
number of models have been studied in the literature, including the
McKean-Vlasov equation, Fokker-Planck equations, the mean field Schrödinger
equation, the Hartree-Fock equation, the Burgers equation, Boltzmann
equations, transport equations, continuity equations, etc.

Incorporating controls in these models gives controlled mean field
equations. If in addition a dynamic optimization setting is present, one
gets a large-scale dynamic game.

5.2 Connection to evolutionary dynamics 

The paradigm of evolutionary game dynamics has been to associate relative
growth rates to actions according to the expected payoff they achieve, and
then to study the asymptotic trajectories of the state of the system, i.e.,
the fraction of players that adopt the different actions. The works in
[231 I5U| IU derive mean field game dynamics for multiple-type population 
games. These mean field game dynamics are generalizations of evolutionary
game dynamics (deterministic or stochastic). For large populations with a
finite number of states and/or actions in X, the standard deterministic
evolutionary game dynamics based on revision protocols are of the form

\[
\dot{m}_t(x) = \sum_{x' \in X} C_{x'x}(m_t)\, m_t(x')
- m_t(x) \sum_{x' \in X} C_{xx'}(m_t), \tag{9}
\]

which is a specific Kolmogorov forward equation. The term $C_{xx'}$
represents the transition rate from x to x'.
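
As a small illustration of equation (9), the sketch below evaluates its
right-hand side (inflow minus outflow) for a given rate function and checks
one classical special case: with the pairwise proportional imitation
protocol, in which a player in x switches to x' at rate
m(x') max(r_{x'}(m) - r_x(m), 0), the dynamics coincide with the replicator
dynamics. The function names and the rock-paper-scissors payoff matrix are
assumptions chosen for the example.

```python
import numpy as np

def mean_dynamics(m, rates):
    """Right-hand side of (9): inflow into each action minus outflow.

    rates(m)[x, y] is the transition rate from action x to action y.
    """
    C = rates(m)
    inflow = m @ C                  # sum over x' of m(x') * C[x', x]
    outflow = m * C.sum(axis=1)     # m(x) * sum over x' of C[x, x']
    return inflow - outflow

def imitation_rates(payoff):
    """Pairwise proportional imitation protocol (assumed for illustration)."""
    def rates(m):
        r = payoff(m)
        gain = np.maximum(r[None, :] - r[:, None], 0.0)  # (r_y - r_x)_+
        return m[None, :] * gain    # meet a y-player with probability m(y)
    return rates

def replicator_rhs(m, payoff):
    """Replicator dynamics: m(x) * (r_x(m) - average payoff)."""
    r = payoff(m)
    return m * (r - m @ r)

# Assumed example: rock-paper-scissors with linear payoffs r(m) = A m.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
payoff = lambda m: A @ m
m = np.array([0.5, 0.3, 0.2])
from_protocol = mean_dynamics(m, imitation_rates(payoff))
from_replicator = replicator_rhs(m, payoff)
```

Other protocols listed in the text would be obtained by swapping in a
different `rates` function while keeping `mean_dynamics` unchanged.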

This equation can be obtained from the drift limit and single selection
per time unit without control parameter ([281 E3 E32]). By specifying the
transition rates C, one gets Replicator dynamics, Best-response dynamics,
Smith dynamics, Brown-von Neumann-Nash dynamics, Orthogonal projection
dynamics, Target projection dynamics, Ray-projection dynamics, Smooth best
response dynamics, Imitative Boltzmann-Gibbs dynamics, Multiplicative
weight imitative dynamics, Generalized pairwise comparison dynamics, Excess
payoff dynamics, "Imitate the better" dynamics, etc. See

[ZD 1221 1201 

5.3 Connection to the propagation of chaos 

If the mean field stochastic game model satisfies invariance in law under
any permutation of the players' indices within the same type, under
specific controls u that preserve this property, one can use the
exchangeability per class or indistinguishability per class [UJ to
establish a propagation of chaos [31], [32]. Let
$x_j^n = (x_{j,t}^n)_{t \geq 0}$. Then, the convergence in law of the
process $M^n = \frac{1}{n}\sum_{j=1}^{n}\delta_{x_j^n}$ to a random process
m with law $\mu$ is equivalent to the so-called $\mu$-chaoticity: for any
integer k and any measurable and bounded functions
$\phi_1, \ldots, \phi_k$,

\[
\lim_{n \to +\infty} E\Big[\prod_{l=1}^{k} \phi_l(x_l^n)\Big]
= \prod_{l=1}^{k} \Big(\int \phi_l \, d\mu\Big). \tag{10}
\]

Non-commutative diagram. Consider a population with n players. Denote the
mean field by $M_t^n = \sum_{j=1}^{n} \omega_j^n \delta_{x_{j,t}^n}$, where
$x_{j,t}^n$ is the state of player j at time t and $\omega_j^n$ is the
weight of player j in the whole population of size n. Then, given an
initial condition $m_0$, denote by $M_t^n[u, m_0]$ the process $M_t^n$
starting with the distribution given by $m_0$ at time 0 subject to the
control u. The study of the process $M_t^n[u, m_0]$ is summarized in the
following diagram:

\[
\begin{array}{ccc}
M_t^n[u, m_0] & \xrightarrow{\ t \to +\infty\ } & \varpi^n[u, m_0] \\
\Big\downarrow {\scriptstyle n \to +\infty} & &
\Big\downarrow {\scriptstyle n \to +\infty} \\
m_t[u, m_0] & \xrightarrow{\ t \to +\infty\ } & {}
\end{array}
\]

If the limits are well-defined, we call $\varpi^n = \lim_t M_t^n$ and
$m_t = \lim_n M_t^n$. Then, the question is about the double limit, i.e.,
the commutativity of the diagram.

It turns out that the double limits can be different. The diagram is not
always commutative:

\[
\lim_n \lim_t M_t^n \neq \lim_t \lim_n M_t^n.
\]



This phenomenon is due in part to the fact that the stationary distribution 
of the process $\varpi^n$ is unique under irreducibility conditions, whereas 
the dynamics of $m_t$ may lead to a limit cycle. As a consequence, many 
techniques and approaches based on the stationary regime (such as fixed point 
techniques, limiting state-action frequency approaches in sequences of 
stochastic games, replica methods, interacting-particle systems, etc.) need 
some justification. This difference in the double limits (the 
non-commutativity phenomenon) calls for care when using stationary population 
state equilibria as outcome predictions and when analyzing equilibrium 
payoffs, since such an equilibrium may not be played. Limit cycles are 
sometimes more appropriate than the stationary equilibrium approach. 

The convergence to an independent and identically distributed system 
is sometimes referred to as chaoticity, and the fact that chaoticity at the 
initial time implies chaoticity at later times is called propagation of chaos. 
The diagram says that, in general, the chaoticity property may not hold 
in the stationary regime. This means that two randomly picked players in the 
population may be correlated. 

We mention a particular case where the rest point $m^*$ is related to 
$\delta_{m^*}$-chaoticity. If the mean field dynamics of $m_t$ has a unique 
global attractor $m^*$, then the propagation of chaos property holds for the 
measure $\delta_{m^*}$. Beyond this particular case, one can have multiple 
rest points, and the double limit $\lim_n \lim_t M^n_t$ may differ from 
$\lim_t \lim_n M^n_t$, leading to a non-commutative diagram. Thus, a deep 
study of the dynamical system is required if one wants to analyze a 
performance metric for a stationary regime. A counterexample with different 
double limits is provided in [36]. 

5.4 Weak convergence 

• de Finetti-Hewitt-Savage. Consider a complete separable metric space $X$ 
and a sequence of random processes $(x^n_j)_{j,n}$ satisfying the 
indistinguishability per class property, i.e., invariance in law under 
permutation within the same type/class. Then the population profile $M^n$ 
converges weakly to a random measure $m$. Moreover, conditionally on $m$, one 
has that for any integer $k$ and any measurable and bounded functions 
$\phi_1, \ldots, \phi_k$ defined over $X$, 

$$\lim_{n} E\left(\prod_{l=1}^{k} \phi_l(x^n_{j_l}) \,\Big|\, m\right) = \prod_{l=1}^{k}\left(\int \phi_l \, dm\right).$$




• Now we focus on the convergence of the pair $(x^n_{j,t}, M^n_t)$. In the 
case where $M^n_t$ goes to a deterministic object $m_t$ under vanishing 
time-scales, it is shown in jlQ] that the pair $(x^n_{j,t}, M^n_t)$ converges 
weakly to $(x_{j,t}, m_t)$, where $x_{j,t}$ is a continuous-time jump and 
drift process (which depends on $m$) and $m_t$ is a solution of an ordinary 
differential equation. 
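The conditional product structure in the de Finetti-Hewitt-Savage statement, and the earlier remark that two randomly picked players may be correlated, can be checked exactly on a toy exchangeable population. The sketch below assumes a hypothetical two-point random measure (a Bernoulli parameter `p` drawn once for the whole population). The mixture weights and the function name `joint_moment` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

# Hypothetical random measure m: every player's state is 1 with probability p,
# where p itself is drawn once for the whole population.
mixture = {Fraction(1, 5): Fraction(1, 2),   # p = 1/5 with probability 1/2
           Fraction(4, 5): Fraction(1, 2)}   # p = 4/5 with probability 1/2

def joint_moment(k):
    """E[x_1 * ... * x_k] for k distinct players: average of p**k over m.

    Conditionally on p the players are i.i.d., so E[x_1...x_k | p] = p**k.
    """
    return sum(weight * p**k for p, weight in mixture.items())

mean_one = joint_moment(1)              # E[x_1]
mean_two = joint_moment(2)              # E[x_1 x_2]
covariance = mean_two - mean_one**2     # nonzero: players are correlated

print(mean_one, mean_two, covariance)
```

Conditionally on `p` the joint moment factorizes as a product, but once `p` is averaged out the covariance between two players is strictly positive. In this toy case chaoticity holds only conditionally on the random measure.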

5.5 Differential population game 

In this subsection we provide a mean field equilibrium characterization of 
the differential population game [31], where each generic player reacts to the 
mean field over a finite horizon $[0,T]$. We start with a payoff of the form 
$r_t(u, m)$ and consider the problem 

$$(*)\qquad \sup_{u}\left[ g_T(m_T) + \int_t^T r_{t'}(u_{t'}, m_{t'})\,dt' \right]$$

subject to the mean field dynamics 

$$m_t = m_0 + \int_0^t f_{t'}(u_{t'}, m_{t'})\,dt'.$$
Jo 

We say the pair of trajectories $(u^*_t, m^*_t)_{t \geq 0}$ constitutes a 
consistent mean field response if $u^*_t$ is an optimal strategy for the 
above problem (*), where $m^*_t$ is the mean field at time $t$, and $u^*_t$ 
produces the mean field $m_t[u^*, m_0] = m^*_t$. 

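A consistent mean field response is characterized by a backward-forward 
equation 

$$\begin{aligned}
& v_T(m) = g_T(m), \\
& -\partial_t v_t = \sup_{u}\left\{ r_t(u, m_t) + \langle \nabla_m v_t,\, f_t(u, m_t) \rangle \right\}, \\
& m_t = m_0 + \int_0^t f_{t'}(u^*_{t'}, m_{t'})\,dt',
\end{aligned}$$

where $u^*_t \in \arg\max_u \left\{ r_t(u, m_t) + \langle \nabla_m v_t,\, f_t(u, m_t) \rangle \right\}$. 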

Next, we consider an individual state-dependent payoff $r_t(x,u,m)$. Define 

$$F_T(x, u, m) = g_T(x_T, m_T) + \int_t^T r_{t'}(x_{t'}, u_{t'}, m_{t'})\,dt',$$

where $g_T$ is a terminal payoff, and consider the problem 

$$(**)\qquad \sup_{u}\left[ g_T(x_T, m_T) + \int_t^T r_{t'}(x_{t'}, u_{t'}, m_{t'})\,dt' \right]$$

subject to the mean field dynamics 

$$m_t = m_0 + \int_0^t f_{t'}(u_{t'}, m_{t'})\,dt',$$

where the individual state $x_t = x_t[u]$ is a continuous-time Markov jump 
process under $u$. We denote by $q$ the infinitesimal generator of $x_t[u]$. 
See [HH El] for more details on the analysis of the process $(x_{j,t}, m_t)$. 

We say the pair of trajectories $(u^*_t, m^*_t)_{t \geq 0}$ constitutes a 
mean field equilibrium if $(u^*_t)_{t \geq 0}$ is a mean field response to the 
above problem (**), where $m^*_t$ is the mean field at time $t$, and $u^*_t$ 
produces the mean field $m_t[u^*, m_0] = m^*_t$. 

Consider a differential population game problem with a single type. Assume 
that there exists a unique pair $(u^*, m^*)$ such that 

(a) there exist a bounded, continuously differentiable function 
$v^x : [0,T] \times \mathbb{R}^{|X|} \to \mathbb{R}$, $v^x_t(m) = v_t(x,m)$, 
and a differentiable function $m^* : [0,T] \to \mathbb{R}^{|X|}$, 
$m^*_t = m_t[u^*, m_0]$, solving the backward-forward equation: 

$$\begin{aligned}
& v_T(x,m) = g_T(x,m), \\
& -\partial_t v_t(x,m) = \sup_{u}\Big\{ r_t(x,u,m) + \langle \nabla_m v_t(x,m),\, f_t(u^*,m) \rangle + \sum_{x' \in X} q_{xux',t}(m)\, v_t(x',m) \Big\}, \\
& m_t = m_0 + \int_0^t f_{t'}(u^*_{t'}, m_{t'})\,dt', \\
& x_0 = x \in X, \quad m_0 \in \Delta(X);
\end{aligned}$$

(b) $u^*_t(x)$ maximizes the function 

$$u \mapsto r_t(x,u,m_t) + \langle \nabla_m v_t(x,m),\, f_t(u^*,m_t) \rangle + \sum_{x' \in X} q_{xux',t}(m_t)\, v_t(x',m),$$

where $q_{xux',t}(m)$ is the transition rate of the infinitesimal generator of 
$x_t$ under the strategy $u$ and the mean field $m$. Since 
$\sum_{x'} q_{xux',t}(m) = 0$, the term $\sum_{x' \in X} q_{xux',t}(m)\, v_t(x',m)$ 
equals 

$$\sum_{x' \neq x} q_{xux',t}(m)\,\big( v_t(x',m) - v_t(x,m) \big),$$

and $m_t[u^*, m_0] = m^*_t$. 

Then $(u^*_t, m^*_t)_{t \geq 0}$ with $m^*_t = m_t[u^*, m_0]$ constitutes a 
mean field equilibrium and $v^x(m^*) = v(x, m^*) = F_T(x, u^*, m^*)$. 

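Similarly, for multiple types the system becomes 

$$\begin{aligned}
& v_{\theta,T}(y_\theta, m) = g_\theta(y_\theta, m), \\
& -\partial_t v_{\theta,t}(y_\theta, m) = \sup_{u_\theta}\Big\{ r_{\theta,t}(y_\theta, u_\theta, m) + \langle \nabla_m v_{\theta,t}(y_\theta, m),\, f_{\theta,t}(u^*, m) \rangle + \sum_{y'_\theta} q_{y_\theta u_\theta y'_\theta}(m)\, v_{\theta,t}(y'_\theta, m) \Big\}, \\
& m_{\theta,t} = m_{\theta,0} + \int_0^t f_{\theta,t'}(u^*_{t'}, m_{t'})\,dt', \\
& y_{\theta,0} = y_\theta, \quad m \in \Delta(X), \quad \theta \in \Theta.
\end{aligned}$$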

Note that the applicability of this result is limited because in general the 
argmax may not be reduced to a singleton. 
6 Mean field systems 



6.1 Transition kernels: 

In this subsection we briefly present the mean field systems. In the 
discrete-time case, the BS-K equation for finite horizon $T$ is given by 

$$\begin{cases}
m_{t+1}(x') = \sum_{x} m_t(x)\,\mathcal{L}_{xx',t}(u_t, m_t), \\[4pt]
\forall\, t, x, a:\quad m_t(x)\,u^*_t(a \mid x) > 0 \;\Longrightarrow\; a \in \arg\max_{b}\Big\{ r_t(x,b,m_t) + \sum_{x'} v_{t+1}(x', m_{t+1})\, q_{xbx'}(u_t, m_t) \Big\}.
\end{cases}$$

Under regularity and boundedness of the instantaneous payoff and the 
transition kernels, the existence of solutions of the backward-forward system 
can be established using the Kakutani-Glicksberg-Fan-Debreu fixed point 
theorem. 
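The coupling between the backward (Bellman) recursion and the forward (Kolmogorov) recursion can be sketched numerically as a damped fixed-point iteration. The sketch below makes several simplifying assumptions that are not in the text: the kernel `q[x, a, x']` does not depend on the mean field, strategies are pure (greedy argmax), and the payoff, terminal value, damping factor and toy two-state model are all hypothetical. Convergence of such an iteration is not guaranteed in general.

```python
import numpy as np

def backward_pass(m_traj, q, reward, terminal):
    """Bellman recursion for a fixed mean field trajectory; returns greedy policy."""
    T = m_traj.shape[0] - 1
    n_x = q.shape[0]
    v = terminal(m_traj[T])                      # v_T(x, m_T)
    policy = np.zeros((T, n_x), dtype=int)
    for t in range(T - 1, -1, -1):
        Q = reward(m_traj[t]) + q @ v            # Q[x, a] = r + sum_x' q[x,a,x'] v[x']
        policy[t] = Q.argmax(axis=1)
        v = Q.max(axis=1)
    return policy, v

def forward_pass(m0, q, policy):
    """Forward recursion m_{t+1}(x') = sum_x m_t(x) q[x, a_t(x), x']."""
    T, n_x = policy.shape
    m_traj = np.zeros((T + 1, n_x))
    m_traj[0] = m0
    for t in range(T):
        kernel = q[np.arange(n_x), policy[t]]    # row x is q[x, a_t(x), :]
        m_traj[t + 1] = m_traj[t] @ kernel
    return m_traj

def solve_backward_forward(m0, q, reward, terminal, T, damping=0.5, iters=200):
    """Alternate backward and forward passes, averaging successive mean fields."""
    m_traj = np.tile(m0, (T + 1, 1))             # initial guess: frozen mean field
    for _ in range(iters):
        policy, v0 = backward_pass(m_traj, q, reward, terminal)
        m_new = forward_pass(m0, q, policy)
        m_traj = (1 - damping) * m_traj + damping * m_new
    return m_traj, policy, v0

# Hypothetical two-state, two-action model with crowding-averse payoffs.
q = np.zeros((2, 2, 2))
q[:, 0] = [[0.9, 0.1], [0.1, 0.9]]               # action 0: mostly stay
q[:, 1] = [[0.1, 0.9], [0.9, 0.1]]               # action 1: mostly switch
reward = lambda m: -m[:, None] - 0.1 * np.arange(2)[None, :]
terminal = lambda m: -m
m_traj, policy, v0 = solve_backward_forward(np.array([0.8, 0.2]), q,
                                            reward, terminal, T=5)
print(m_traj)
```

Because each update is a convex combination of probability vectors, every row of `m_traj` stays on the simplex. Handling the mixed strategies $u^*_t(a \mid x)$ of the BS-K system would require replacing the greedy argmax by a distribution over maximizers.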

6.2 Mean field Ito's SDE 

In this subsection we present the backward-forward system for a mean field 
limit satisfying Ito's stochastic differential equation. The mean field system 
for horizon $T$ in continuous time, for a payoff of the form 
$E\left(g_T(m_T) + \int_0^T r_t(u_t, m_t)\,dt\right)$, is given by (McK-V-FPK): 

$$\begin{aligned}
& v_T(m) = g_T(m), \\
& -\partial_t v_t = \sup_{u_t \in U}\Big\{ r_t(m_t, u_t) + \frac{1}{2}\sum_{(x,x') \in X^2} a_{xx',t}(m_t, u_t)\, \partial^2_{m_x m_{x'}} v_t \Big\}, \\
& \partial_t m_t + \mathrm{div}\big( f_t(m_t, u^*_t)\, m_t \big) = \frac{1}{2}\sum_{x,x'} \partial^2_{xx'}\big( a_{xx',t}(m_t, u^*_t)\, m_t \big), \\
& m_0 \in \Delta(X),
\end{aligned}$$

where $f_t$ is the drift and $\sigma_t \sigma'_t = a_t$. 

6.3 Stochastic difference games 

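Consider the stochastic difference equation in $\mathbb{R}$: 

$$\cdots \;+\; \sum_{i=1}^{n} \omega^n_{ij}\, \sigma_{t_k}(x^n_{j,t_k}, u^n_{j,t_k}, x^n_{i,t_k})\,\big( B_{j,t_{k+1}} - B_{j,t_k} \big),$$
$$x^n_{j,0} = x_j, \quad t_k = k\delta_n, \quad k \geq 0, \quad \delta_n > 0, \quad \lim_n \delta_n = 0,$$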
where $\omega^n_{ij}$ is a weight representing the influence of player $i$ on 
player $j$'s state. We define the cumulative function $F^n$ as 
$F^n(t,w) = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{\tilde{x}^n_{j,t} < w\}}$, 
where $\tilde{x}^n_{j,t}$ is the interpolated process from $x^n_{j,t_k}$. For 
any $T < +\infty$ there exists $c_T > 0$ such that 

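$$E\,\| F(t,\cdot) - F^n(t,\cdot) \|_1 \leq c_T \left[ \| F_0 - F^n_0 \|_1 + \frac{1}{\sqrt{n}} + \sqrt{\delta_n} \right] \qquad (12)$$

Moreover $F(t,\cdot)$ is the solution of 

$$\frac{\partial}{\partial t} F_{\theta,t}(x) + \left( \int f_{\theta,t}(x,u,w)\, \frac{\partial}{\partial w} F_{\theta,t}(w)\,dw \right) \frac{\partial}{\partial x} F_{\theta,t}(x) = \frac{1}{2}\frac{\partial}{\partial x}\left[ \left( \int \sigma_{\theta,t}(x,u,w)\, \partial_w F_{\theta,t}(w)\,dw \right)^{2} \frac{\partial}{\partial x} F_{\theta,t}(x) \right] \qquad (13)$$

$$\theta \in \Theta, \quad m_0(\cdot) \text{ fixed}. \qquad (14)$$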
(14) 



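See [38] for more details. The finite horizon cost function optimization leads 
to a coupled system of backward-forward equations: 

$$\begin{aligned}
& v_{j,T}(x_j, m) = g(x_j, m), \\
& -\partial_t v_{j,t} = \sup_{u_j}\left\{ r_{\theta,t}(x_j, u_j, m_t) + \langle f_t(x_j, u_j, m_t),\, \partial_x v_{j,t} \rangle \right\}, \\
& dx_{\theta,t} = \int_w f_{\theta,t}(x_{\theta,t}, u^*_{\theta,t}, w)\, m_t(dw)\,dt + \int_w \sigma_{\theta,t}(x_{\theta,t}, u^*_{\theta,t}, w)\, m_t(dw)\,dB_t, \\
& x_0 = x, \\
& \partial_t m_{\theta,t} + \partial_x\big[ f_{\theta,t}(x, u^*_t, m_t)\, m_{\theta,t} \big] = \frac{1}{2}\frac{\partial^2}{\partial x^2}\big[ \sigma^2_t(x, u^*_\theta, m_t)\, m_{\theta,t} \big], \\
& \theta \in \Theta, \quad m_0(\cdot) \in \Delta(X), \\
& f_t = \int_w f_{\theta,t}(x_{\theta,t}, u^*_{\theta,t}, w)\, m_t(dw).
\end{aligned}$$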



6.4 Risk-sensitive mean field stochastic games 

A link between the stochastic and deterministic mean field viewpoints is 
provided by a risk-sensitive stochastic approach. The risk-sensitive approach 
consists in optimizing the expectation $E(g(R))$, where $R$ is the traditional 
long-term payoff function. The certainty-equivalent expectation $e(R)$ is 
defined by $g(e(R)) = E(g(R))$. When $g(y) = e^{\mu y}$ is exponential, 
$e(R) = g^{-1}(E(g(R))) = \frac{1}{\mu}\log\left( E\left( e^{\mu R} \right) \right)$. 
Consider the finite horizon payoff: 

$$\frac{1}{|\mu|}\,\mathrm{sign}(\mu)\,\log E\left[ e^{\mu\left( g(x_{T+1}) + \sum_{t'=t}^{T} r_{t'}(x_{t'}, u_{t'}, M_{t'}) \right)} \right]$$






The intuition behind the risk-sensitive criterion near zero is the following: 
a Taylor expansion for $\mu$ close to zero gives 

$$R_\mu = E(R) + \frac{\mu}{2}\,\mathrm{var}(R) + O(\mu^2).$$

This means that the risk-sensitive criterion takes into consideration not only 
the expectation but also the variance. 
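The expansion of the certainty equivalent $\frac{1}{\mu}\log E(e^{\mu R})$ around $\mu = 0$ can be checked numerically on a finite-valued payoff. The payoff values and probabilities below are hypothetical and serve only to illustrate the mean-variance approximation.

```python
import numpy as np

def certainty_equivalent(payoffs, probs, mu):
    """(1/mu) * log E[exp(mu * R)] for a finite-valued payoff R."""
    return np.log(np.dot(probs, np.exp(mu * payoffs))) / mu

payoffs = np.array([0.0, 1.0, 4.0])   # hypothetical payoff values
probs = np.array([0.5, 0.3, 0.2])     # hypothetical probabilities

mean = np.dot(probs, payoffs)
var = np.dot(probs, (payoffs - mean) ** 2)

for mu in (0.1, 0.01, -0.01):
    exact = certainty_equivalent(payoffs, probs, mu)
    approx = mean + 0.5 * mu * var    # mean-variance approximation
    print(mu, exact, approx)
```

The gap between `exact` and `approx` shrinks as $|\mu|$ decreases. For $\mu > 0$ the certainty equivalent lies above the mean and for $\mu < 0$ it lies below, which follows from Jensen's inequality.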

When $\mu \to 0$ one gets the risk-neutral case. Depending on the sign of 
$\mu$, one gets the risk-seeking case or the risk-averse case. The analogue of 
BS-K becomes a multiplicative BS-K, i.e., a mean field version of the 
multiplicative Bellman-Shapley equation coupled with the Kolmogorov equation. 
Denote by $v_{j,\mu,t}$ the optimal payoff of player $j$ with respect to $m$. 



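$$\begin{cases}
g\big( v_{j,\mu,t}(x_t, m_t) \big) = \max_{u \in \Delta(A(x_t))} \Big\{ e^{\mu r_t(x_t, u, m_t)} \sum_{x'} q_{x_t u x',t}(m_t)\, g\big( v_{j,\mu,t+1}(x', m_{t+1}) \big) \Big\}, \\[4pt]
m_{t+1}(x') = \sum_{x \in X} m_t(x)\, \mathcal{L}_{xx'}(u^*_t, m_t),
\end{cases}$$

where 

$$u^* \in \arg\max_{u}\; e^{\mu r_t(x_t, u, m_t)} \sum_{x'} q_{x_t u x',t}(m_t)\, g\big( v^*_{\mu,t+1}(x', m_{t+1}) \big).$$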
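For a frozen mean field, the multiplicative Bellman recursion acts on the exponentiated value $w_t = e^{\mu v_t}$. The sketch below compares it with the ordinary (risk-neutral) recursion on a small hypothetical model. The kernel `q`, the time-homogeneous `reward` array and the `terminal` vector are illustrative assumptions. For $\mu < 0$ the code takes a minimum of the exponentiated quantity, which is how maximizing the certainty equivalent translates after exponentiation.

```python
import numpy as np

def risk_neutral_backward(q, reward, terminal, T):
    """Additive Bellman recursion v_t = max_a { r + sum_x' q v_{t+1} }."""
    v = terminal.copy()
    for _ in range(T):
        v = (reward + q @ v).max(axis=1)
    return v

def risk_sensitive_backward(q, reward, terminal, T, mu):
    """Multiplicative recursion on w_t = exp(mu * v_t); returns v_0 = log(w_0)/mu."""
    w = np.exp(mu * terminal)                       # w_T(x) = exp(mu * v_T(x))
    for _ in range(T):
        candidates = np.exp(mu * reward) * (q @ w)  # shape (n_states, n_actions)
        # Maximizing v = log(w)/mu means max over w if mu > 0, min if mu < 0.
        w = candidates.max(axis=1) if mu > 0 else candidates.min(axis=1)
    return np.log(w) / mu

# Hypothetical two-state, two-action model with a frozen mean field.
q = np.zeros((2, 2, 2))
q[:, 0] = [[0.9, 0.1], [0.1, 0.9]]
q[:, 1] = [[0.1, 0.9], [0.9, 0.1]]
reward = np.array([[1.0, 0.5],
                   [0.0, 0.8]])
terminal = np.array([0.0, 1.0])

v_neutral = risk_neutral_backward(q, reward, terminal, T=5)
v_small = risk_sensitive_backward(q, reward, terminal, T=5, mu=1e-4)
v_seek = risk_sensitive_backward(q, reward, terminal, T=5, mu=1.0)
v_averse = risk_sensitive_backward(q, reward, terminal, T=5, mu=-1.0)
print(v_neutral, v_small, v_seek, v_averse)
```

With a very small `mu` the risk-sensitive values are close to the risk-neutral ones, in line with the $\mu \to 0$ remark. A full mean field solution would additionally update $m_t$ through the forward equation.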



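Considering individual state dynamics in the form of McKean-Vlasov, 

$$\begin{aligned}
& dx^n_{j,t} = \left( \int_w f_t(x^n_{j,t}, u^n_{j,t}, w) \Big[ \frac{1}{n}\sum_{i=1}^{n} \delta_{x^n_{i,t}} \Big](dw) \right) dt + \left( \int_w \sigma_t(x^n_{j,t}, u^n_{j,t}, w) \Big[ \frac{1}{n}\sum_{i=1}^{n} \delta_{x^n_{i,t}} \Big](dw) \right) dB_{j,t}, \\
& x_{j,0} \in \mathbb{R}^k, \quad k \geq 1, \quad j \in \{1, 2, \ldots, n\},
\end{aligned}$$

and a risk-sensitive cost criterion 

$$R_j(u_j, M^n; t, x_j, m) = \frac{1}{\mu}\log E\left( e^{\mu\left[ g_T(x_T) + \int_t^T r_s(x_{j,s}, u_{j,s}, M^n_s)\,ds \right]} \,\Big|\, x_{j,t} = x_j,\; M^n_t = m \right).$$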

We assume regular and bounded coefficients and derivatives with respect to 
the states, and that $u_j : [0,T] \times \mathbb{R}^k \to U_j$ is piecewise 
continuous in $t$ and Lipschitz in $x$. The mean field system becomes HJBF + 
Fokker-Planck-Kolmogorov equation + macroscopic McKean-Vlasov individual 
dynamics, i.e., 

$$\begin{aligned}
& dx_{j,t} = \left( \int_w f_t(x_{j,t}, u^*_{j,t}, w)\, m_t(dw) \right) dt + \left( \int_w \sigma_t(x_{j,t}, u^*_{j,t}, w)\, m_t(dw) \right) dB_{j,t}, \\
& x_{j,0} = x, \\
& \partial_t v_{j,t} + \sup_{u_j}\Big\{ \partial_x v_{j,t} \cdot f_t + \frac{1}{2}\mathrm{tr}\big( \sigma_t \sigma'_t\, \partial^2_{xx} v_{j,t} \big) + \frac{\mu}{2}\,\| \sigma_t\, \partial_x v_{j,t} \|^2 + r_t \Big\} = 0, \\
& x_j := x, \quad v_{j,T}(x, m) = g_T(x, m), \\
& \partial_t m_t + D^1_x\Big( m_t \int_w f_t(x, u^*, w)\, m_t(dw) \Big) = \frac{1}{2} D^2_{xx}\Big( m_t \Big( \int_w \sigma'_t(x, u^*, w)\, m_t(dw) \Big) \cdot \Big( \int_w \sigma_t(x, u^*, w)\, m_t(dw) \Big) \Big).
\end{aligned}$$

Under specific structures of the drift, payoff and volatility functions, an 
existence result can be derived using fixed point theory. Uniqueness can also 
be addressed under monotonicity conditions. However, existence and uniqueness 
conditions for the above system under a general structure remain a 
challenging problem. 

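Here $f_t(\cdot) \in \mathbb{R}^k$, which we denote by 
$(f_{k',t}(\cdot))_{1 \leq k' \leq k}$. Let 

$$\sigma_t[x, u^*_t, m_t] = \int_w \sigma_t(x, u^*_t, w)\, m_t(dw);$$

$\Gamma_t(\cdot) := \sigma_t[\cdot]\,\sigma'_t[\cdot]$ is a square matrix with 
dimension $k \times k$. The term $D^1_x(\cdot)$ denotes 

$$\sum_{k'=1}^{k} \frac{\partial}{\partial x_{k'}} \left( m_t \int_w f_{k',t}(x, u^*_t, w)\, m_t(dw) \right),$$

and the last term $D^2_{xx}(\cdot)$ denotes 

$$\sum_{k''=1}^{k} \sum_{k'=1}^{k} \frac{\partial^2}{\partial x_{k'}\, \partial x_{k''}} \big( m_t\, \Gamma_{k'k'',t}(\cdot) \big).$$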

One can show that the asymptotic large deviations results as $\mu \to 0$ are 
typically described through a risk-neutral mean field problem. This approach 
is closely related to large deviations, $H^\infty$-control, the minmax 
Hamiltonian of Isaacs and robust mean field stochastic games. Preliminary 
results can be found in |JT]. The model can be extended to include random 
switching (jump and drift process) and delayed state measurement. 
6.5 Explicit solutions of MFSG 



There are a few classes of mean field stochastic games for which explicit 
solutions have been found: 

• Mean field difference games with linear states and quadratic Hamiltonian 

• Linear-quadratic mean field differential games 

• Mean-variance mean field differential games 

• Mean-variance mean field difference games 

• Risk-sensitive mean field games with exponentiated long-term quadratic 
loss and linear dynamics 

More details can be found in [36]. 

6.6 Other extensions 

• Extension to Poisson point processes, Levy flights, Feller processes etc. 

• Learning in large populations: Assume that the strategy of each player is 
revised according to some dynamics, which can have class-dependent drift and 
class-dependent diffusion terms. Then the limit of the learning process 
leads to a mean field PDE. When the diffusion is zero, one gets the so-called 
continuity equation or transport equation. We refer the reader to 
[33], [35], [39] for recent developments on combined fully distributed payoff 
and strategy reinforcement learning (CODIPAS-RL). 

• Imperfect state measurement: Now, we assume that the state $x_{j,t}$ is not 
observed by player $j$, but only $y_{j,t}$, which is an output function of the 
state and noise. In such situations, a fundamental question is: how to 
estimate the state under imperfect measurement using mean field stochastic 
games? 

• Mean field stochastic games with correlated populations, different types 
of players including major, minor and medium players, neighborhood based 
partial monitoring, hierarchical structure, and dynamic conjectural varia- 
tions. 

• Mean field cooperative games, mean field network formation games, 
mean field Stackelberg games, mean field Bayesian games, etc. Mean field 
Q-learning, mean field H-learning: heterogeneous, hybrid, cost of learning, 
random updates, noisy strategy in large-scale systems, etc. 

• Mean field games under fractional Brownian motion, anomalous diffusion 
(subdiffusion, superdiffusion). 

7 Conclusions 

In this paper we have presented recent advances in mean field stochastic 
games, their applications, and their connections to related fields in 
large-scale systems. Below we point out some limitations and open issues for 
future work: 

• What about a system with small size (5, 7, 29, 31 players)? 

• The curse of dimensionality problems are transformed into a condensed 
form (using localized density or aggregative terms). Are we able to solve 
the resulting continuum variables? What is the complexity in solving 
the continuum model? 

• Is there a performance loss in using the mean field approach? What is the 
performance gap? 

• Beyond the indistinguishability per class property, what is the class of 
finite games for which the mean field approach can be applied? How 
big is this class of games compared to the set of all games? 